Fix get_zfs_sb race and misc fixes #4828

tuxoko · 2016-07-07T00:15:24Z

Don't allow accessing XATTR via export handle
Fix Large kmem_alloc in vdev_metaslab_init
Add configure result for xattr_handler
fh_to_dentry should return ESTALE when generation mismatch
Fix get_zfs_sb race with concurrent umount

don-brady · 2016-07-07T15:49:39Z

module/zfs/vdev.c

@@ -892,7 +892,7 @@ vdev_metaslab_init(vdev_t *vd, uint64_t txg)

 	ASSERT(oldc <= newc);

-	mspp = kmem_zalloc(newc * sizeof (*mspp), KM_SLEEP);
+	mspp = vmem_zalloc(newc * sizeof (*mspp), KM_SLEEP);


Should the associated kmem_free for this also be changed to vmem_free?

Yes, you're right. I probably didn't do that because it happens to be the same under the hood. But there's no guarantee it won't change.

tuxoko · 2016-07-07T18:56:28Z

There's a strange deadlock in zfstests:
http://build.zfsonlinux.org/builders/CentOS%207.1%20x86_64%20%28TEST%29/builds/1738/steps/shell_11/logs/console

It shouldn't be caused by this patch, but we need to investigate it.

Allowing access to XATTR through an export handle is a very bad idea. It would allow users to write whatever they want in fields where they otherwise could not.

Signed-off-by: Chunwei Chen <[email protected]>

This allocation can go way over 1MB, so we should use vmem_alloc instead of kmem_alloc.

[ 135.552116] Large kmem_alloc(1430784, 0x1000), please file an issue at:
[ 135.552116] https://github.com/zfsonlinux/zfs/issues/new
[ 135.552125] CPU: 5 PID: 8789 Comm: zpool Tainted: P O 3.16.0-4-amd64 #1 Debian 3.16.7-ckt25-2
[ 135.552127] Hardware name: IBM System x3650 M2 -[794732U]-/49Y6498 , BIOS -[D6E128AUS-1.03]- 08/20/2009
[ 135.552129] 0000000000000000 ffffffff8150e835 0000000000000000 000000000000c210
[ 135.552133] ffffffffa0324aff ffff880fe84a8000 ffff880fe84a8000 0000000000000000
[ 135.552136] 000000000002baa0 ffff880fe89e9000 ffffffffa17d0c8d 0000000000000000
[ 135.552140] Call Trace:
[ 135.552150] [<ffffffff8150e835>] ? dump_stack+0x5d/0x78
[ 135.552167] [<ffffffffa0324aff>] ? spl_kmem_zalloc+0xef/0x160 [spl]
[ 135.552197] [<ffffffffa17d0c8d>] ? vdev_metaslab_init+0x9d/0x1f0 [zfs]
[ 135.552216] [<ffffffffa17d46d0>] ? vdev_load+0xc0/0xd0 [zfs]
[ 135.552231] [<ffffffffa17d4643>] ? vdev_load+0x33/0xd0 [zfs]
[ 135.552247] [<ffffffffa17c0004>] ? spa_load+0xfc4/0x1b60 [zfs]
[ 135.552264] [<ffffffffa17c1838>] ? spa_tryimport+0x98/0x430 [zfs]
[ 135.552277] [<ffffffffa17f28b1>] ? zfs_ioc_pool_tryimport+0x41/0x80 [zfs]
[ 135.552291] [<ffffffffa17f5669>] ? zfsdev_ioctl+0x4a9/0x4e0 [zfs]
[ 135.552294] [<ffffffff811bacdf>] ? do_vfs_ioctl+0x2cf/0x4b0
[ 135.552297] [<ffffffff810852e1>] ? task_work_run+0x91/0xb0
[ 135.552299] [<ffffffff811baf41>] ? SyS_ioctl+0x81/0xa0
[ 135.552301] [<ffffffff81516a28>] ? page_fault+0x28/0x30
[ 135.552303] [<ffffffff81514a0d>] ? system_call_fast_compare_end+0x10/0x15

Signed-off-by: Chunwei Chen <[email protected]>
Closes openzfs#4752
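For illustration only: the 1430784-byte request in the log corresponds to 178848 pointers if pointers are 8 bytes, as they would be on that amd64 machine. Below is a minimal user-space sketch of growing such a pointer array while keeping the allocation and the free in the same allocator family. The `sketch_vmem_zalloc` and `sketch_vmem_free` helpers are hypothetical stand-ins built on `calloc`/`free`, not the SPL functions, and `grow_ptr_array` is not the code in `vdev_metaslab_init`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical user-space stand-ins for the SPL allocators (assumption). */
static void *
sketch_vmem_zalloc(size_t size)
{
	return (calloc(1, size));
}

static void
sketch_vmem_free(void *buf, size_t size)
{
	(void) size;	/* the kernel-style free takes the size; unused here */
	free(buf);
}

/* Bytes needed for `count` pointers, or 0 if the size would overflow. */
static size_t
ptr_array_size(uint64_t count)
{
	if (count > SIZE_MAX / sizeof (void *))
		return (0);
	return ((size_t)count * sizeof (void *));
}

/*
 * Grow a zeroed pointer array from oldc to newc entries, copying the old
 * entries and releasing the old array with the matching free routine.
 * Returns NULL on overflow, zero size, or allocation failure; in that
 * case the old array is left untouched.
 */
static void **
grow_ptr_array(void **old, uint64_t oldc, uint64_t newc)
{
	size_t newsz = ptr_array_size(newc);
	void **arr;

	assert(oldc <= newc);
	if (newsz == 0)
		return (NULL);
	arr = sketch_vmem_zalloc(newsz);
	if (arr == NULL)
		return (NULL);
	if (old != NULL) {
		memcpy(arr, old, ptr_array_size(oldc));
		sketch_vmem_free(old, ptr_array_size(oldc));
	}
	return (arr);
}
```

The point of the sketch is only the pairing: whatever routine allocates the array, the corresponding free from the same family releases it, so the code keeps working if the two families ever stop sharing an implementation.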

Signed-off-by: Chunwei Chen <[email protected]>

A generation mismatch usually means the file referenced by the file handle was deleted. We should return ESTALE to indicate this. We return ENOENT in zfs_vget since zpl_fh_to_dentry will convert it to ESTALE.

Signed-off-by: Chunwei Chen <[email protected]>
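A rough user-space sketch of the error mapping described in this commit message. The lookup reports a generation mismatch as ENOENT, and the export-layer wrapper turns ENOENT into ESTALE. All names here (`sketch_obj`, `sketch_vget`, `sketch_fh_to_obj`) are hypothetical and the table lookup is a stand-in; this is not the actual `zfs_vget` or `zpl_fh_to_dentry` code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical object record: an id plus a generation number. */
struct sketch_obj {
	uint64_t id;
	uint64_t gen;
};

/*
 * Look up (id, gen) in a table. Returns 0 on success or a positive errno.
 * A matching id with a different generation means the handle refers to a
 * file that no longer exists, so it is reported as ENOENT.
 */
static int
sketch_vget(const struct sketch_obj *tbl, size_t n, uint64_t id,
    uint64_t fh_gen, const struct sketch_obj **out)
{
	assert(out != NULL);
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].id != id)
			continue;
		if (tbl[i].gen != fh_gen)
			return (ENOENT);
		*out = &tbl[i];
		return (0);
	}
	return (ENOENT);
}

/*
 * Export-layer wrapper: returns 0 or a negative errno, converting
 * ENOENT from the lookup into ESTALE for the file-handle caller.
 */
static int
sketch_fh_to_obj(const struct sketch_obj *tbl, size_t n, uint64_t id,
    uint64_t fh_gen, const struct sketch_obj **out)
{
	int rc = -sketch_vget(tbl, n, id, fh_gen, out);

	if (rc == -ENOENT)
		rc = -ESTALE;
	return (rc);
}
```

With a table holding object 7 at generation 2, a handle carrying (7, 2) resolves, while a handle carrying (7, 1) gets -ESTALE rather than -ENOENT.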

behlendorf · 2016-07-12T00:36:36Z

module/zfs/zfs_ioctl.c

-	} else {
+	/* bump s_active only when non-zero to prevent umount race */
+	if (!*zsbp || !(*zsbp)->z_sb ||
+	    !atomic_inc_not_zero(&((*zsbp)->z_sb->s_active))) {


The fix here looks good but let's structure it as a == NULL check for readability.

if (*zsbp == NULL || (*zsbp)->z_sb == NULL || ...

behlendorf · 2016-07-12T00:38:28Z

@tuxoko the strange deadlock you referenced appears to have been introduced by the additional test cases enabled by the resumable recv patch stack. Thanks for proposing a fix for it in this stack.

All the changes here LGTM. I have just one minor style nit I commented on. I've also resubmitted the test runs which failed.

tuxoko · 2016-07-12T01:02:34Z

@behlendorf
Actually, I think I fixed it in the get_zfs_sb patch.
However, the failure in zfstests probably results from the stuff you mentioned.

Certain ioctl operations will call get_zfs_sb, which will hold an active count on the sb without checking whether it's active or not. This will result in use-after-free. We fix this by using atomic_inc_not_zero to make sure we got an active sb.

P1                                          P2
---                                         ---
deactivate_locked_super(): s_active = 0
                                            zfs_sb_hold()
                                            ->get_zfs_sb(): s_active = 1
->zpl_kill_sb()
-->zpl_put_super()
--->zfs_umount()
---->zfs_sb_free(zsb)
                                            zfs_sb_rele(zsb)

Signed-off-by: Chunwei Chen <[email protected]>
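To make the increment-only-if-non-zero idea concrete, here is a minimal user-space sketch using C11 atomics. It is not the kernel's `atomic_inc_not_zero` implementation, `sketch_sb` and `sketch_get_sb` are hypothetical names, and returning ESRCH on failure is an assumption made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Increment *v only if it is currently non-zero. Returns true when the
 * increment happened, false when the counter was already zero.
 */
static bool
sketch_inc_not_zero(atomic_int *v)
{
	int old;

	assert(v != NULL);
	old = atomic_load(v);
	while (old != 0) {
		/* On failure, `old` is refreshed with the current value. */
		if (atomic_compare_exchange_weak(v, &old, old + 1))
			return (true);
	}
	return (false);
}

/* Hypothetical stand-in for a superblock with an active count. */
struct sketch_sb {
	atomic_int s_active;
};

/*
 * Take a hold only on a superblock that is still active. A count of
 * zero means teardown has started, so the caller must not touch it.
 */
static int
sketch_get_sb(struct sketch_sb *sb)
{
	if (sb == NULL || !sketch_inc_not_zero(&sb->s_active))
		return (ESRCH);
	return (0);
}
```

In terms of the timeline above, P2 would fail here once P1 has dropped `s_active` to zero, instead of bumping the count back to 1 on a superblock that is about to be freed.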

tuxoko · 2016-07-12T18:03:30Z

updated the style in get_zfs_sb

Allowing access to XATTR through an export handle is a very bad idea. It would allow users to write whatever they want in fields where they otherwise could not.

Signed-off-by: Chunwei Chen <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Issue #4828

Signed-off-by: Chunwei Chen <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Issue #4828

A generation mismatch usually means the file referenced by the file handle was deleted. We should return ESTALE to indicate this. We return ENOENT in zfs_vget since zpl_fh_to_dentry will convert it to ESTALE.

Signed-off-by: Chunwei Chen <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Issue #4828

behlendorf · 2016-07-12T22:57:26Z

Merged as:

6c25306 fh_to_dentry should return ESTALE when generation mismatch
d470101 Add configure result for xattr_handler
bffb68a Fix Large kmem_alloc in vdev_metaslab_init
7938c2a Don't allow accessing XATTR via export handle
061460d Fix get_zfs_sb race with concurrent umount